Summary


A previous report, “The Meaning and Utility of Confidence,” DNA-TR-92-83,
April  1992,  explored  and  discussed  statistical  methods  and  procedures  that  may  be 
applied  to  validate  the  survivability  of  a  complex  system  of  systems  that  cannot  be  tested 
as  an  entity.  It  described  a  methodology  where  Monte  Carlo  simulation  was  used  to 
develop  the  system  survivability  distribution  from  the  component  distributions  using  a 
system  model  that  registers  the  logical  interactions  of  the  components  to  perform 
system  functions. 

This paper discusses methods that can be used to develop the required
survivability distributions from three sources of knowledge: (1) available test
results; (2) where little or no test data are available, a good understanding of the
physical laws and phenomena, which can be applied by computer simulation; and (3)
where neither test data nor adequate knowledge of the physics is available, the
judgment of experts, which must be relied upon and quantified.

This  paper  describes  the  relationship  between  the  confidence  bounds  that  can  be 
placed  on  survivability  and  the  number  of  tests  conducted.  It  discusses  the  procedure 
for  developing  system  level  survivability  distributions  from  the  distributions  for  lower 
levels  of  integration.  It  demonstrates  application  of  these  techniques  by  defining  a 
communications  network  for  a  Hypothetical  System  Architecture.  A  logic  model  for 
the  performance  of  this  communications  network  is  developed,  as  well  as  the 
survivability  distributions  for  the  nodes  and  links  based  on  two  alternate  data  sets, 
reflecting  the  effects  of  increased  testing  of  all  elements.  It  then  shows  how  this 
additional  testing  could  be  optimized  by  concentrating  only  on  those  elements  contained 
in  the  low-order  fault  sets  which  the  methodology  identifies. 
SECTION 1


INTRODUCTION 


This  paper  discusses  methods  for  developing  survivability  distributions  for 
components, subsystems, and systems based upon three types of knowledge: (1) test
data;  (2)  physical  principles;  and  (3)  engineering  judgment.  It  also  elucidates  the 
relationship  between  the  confidence  bounds  that  can  be  placed  on  component  and  system 
survivability  and  the  number  of  tests  conducted.  The  exposition  portrays  the  system 
survivability distribution for a simple series and a simple parallel system composed of
two components whose survivability distributions for a specified environment have been
developed.  Finally,  a  Hypothetical  System  Architecture  (HSA)  communication  network 
is  conceptualized,  a  logic  model  for  its  performance  is  developed,  and  the  survivability 
distributions  for  the  composite  system  are  developed  for  two  alternate  data  sets.  The 
survivability  distributions  developed  and  employed  for  the  communication  nodes  and 
data  links,  given  the  imposition  of  a  specified  nuclear  environment,  reflect  the  effects  of 
increased testing. Attention is also given to prioritizing which components or
subsystems should undergo additional testing or simulation, so that the confidence
that a given survivability is achieved, or the system survivability itself, can be
improved cost-effectively.

In  this  paper  we  assume  that  one  of  the  desired  and  required  results  from  any 
weapon  system  development  and  deployment  program  is  to  demonstrate  system 
survivability  with  high  confidence.  Indeed,  the  way  to  discriminate  between  alternate 
survivability  protocols  will  be  to  compare  estimates  of  the  quantitative  survivability  at 
fixed  confidence,  say  90%,  or,  alternatively,  to  compare  the  confidence  at  which  a 
specified  system  survivability,  say  0.95,  can  be  achieved.  The  purpose  of  this  paper  is 
to  describe  how  such  system-level  survivability-confidence  measures  are  developed 
from  available  knowledge  sources. 

As  discussed  in  the  references,  we  treat  survivability  (S)  -  the  probability  of 
component,  subsystem,  or  system  survival  -  as  a  random  variable  and  develop  its 
cumulative  probability  distribution.  With  developed  survivability  distributions,  we 
make  statements  about  the  certainty  or  probability  with  which  S  lies  within  specific 
intervals.  The  probability  that  random  variable  S  lies  within  a  specified  interval  is 
termed  confidence. 

In  this  exposition  we  have  confined  the  family  of  probability  distributions 
considered to beta distributions. The reasons for doing so are four-fold: (1) the domain
of the beta distribution is the range from 0 to 1, the proper range for a probability (i.e.,
S); (2) the beta distribution can take an infinite number of shapes by varying its two
parameters to accommodate and approximate virtually any single mode probability
density function; (3) the beta distribution is the continuous counterpart to discrete
binomial  distributions  which  reflect  the  results  of  many,  if  not  most,  tests,  making  its 
use  both  direct  and  natural  in  many  situations;  (4)  an  intuition  about  the  meaning  of  the 
two  beta  parameters  that  relates  them  to  the  numbers  of  successes  and  failures  in  n 
binomial  tests  provides  a  method  for  developing  distributions  when  engineering 
judgment  must  be  employed. 


SECTION  2 


DEVELOPING  SURVIVABILITY  DISTRIBUTIONS 


To  quantify  and  place  confidence  bounds  upon  system  survivability  employing  the 
procedures exemplified herein, and more fully developed in Williams et al., April
1992  and  July  1992,  requires  developing  the  underlying  component,  subsystem,  and 
system  survivability  distributions  for  the  nuclear  environments  of  concern.  Obviously, 
few,  if  any,  nuclear  tests  of  actual  component  or  subsystem  hardware  will  be  conducted, 
and  none  will  be  conducted  of  complex,  spatially-distributed  systems  of  systems.  The 
basic  question  then  becomes,  how  can  the  survivability  of  complex  systems  be  quantified 
and validated from the knowledge bases available or possible to be acquired?

To  be  credible,  survivability  distributions  for  both  components  and  systems  must 
rest  upon  available  knowledge  sources  -  test  data,  known  physical  laws  and  phenomena 
and engineering judgment. In this section we address methods for deriving quantitative
survivability distributions from three knowledge sources: (1) available test results; (2)
known physical laws and phenomena, applied by computer simulation when there is
little or no test data; (3) the judgment of experts, which must be quantified and used
when neither test data nor sufficient knowledge of the governing physical laws is
available.


2.1  DEVELOPING  SURVIVABILITY  DISTRIBUTIONS  BASED  UPON 
TEST  DATA. 

Survivability  distributions  based  upon  actual  test  data  in  the  stressing 
environments  are  most  direct,  acceptable,  believable,  and  valid.  If  such  data  are 
available,  they  can  be  used  directly  with  minimal  subjective  judgments.  Standard 
statistical  manipulations  can  be  employed,  hypotheses  can  be  postulated,  and  inferences 
and  conclusions  can  be  drawn.  All  other  procedures  and  techniques  for  quantifying  the 
probabilities  of  survival  require  further  rationalizations,  assumptions,  and  judgments, 
often  compounded  to  many  levels. 

2.1.1  Using  Binomial  Test  Results. 

A  major  source  of  test  information  comes  from  binomial  or  attribute  test  results 
where  components  or  subsystems  are  subjected  to  one  or  more  environments,  singly  or 
in  combination,  and  the  device  response  is  binary  -  it  either  survives  or  fails. 

Treatment  of  binomial  test  data  is  highly  developed  and  commonplace. 

It is well known that classical confidence bounds on the true, but unknown,
survivability of a component, subsystem, or system in a specified environment can be
developed from test results that record the number of times the equipment survived in n
tests. Hence, using well-known procedures, if there were zero failures in ten trials, we
can state that a 90% lower confidence bound (LCB) on the survivability of the
component is 0.79. Using the same test results and procedure, but choosing another
confidence level, say 50%, we develop the 50% LCB to be 0.93. Tables 2-1 and 2-2
record the 50% and 90% LCBs for various test results from 10, and from 100, trials.
By choosing various confidence levels a continuum of confidence bounds from 0 to
100% can be developed from the results of any binomial test. This continuum is the
cumulative survivability distribution inferred from the binomial test data available.

There is a general relationship between binomial LCBs and the cumulative beta
probability distribution (U.S. Army Armament Research and Development Command,
1981). If 100γ% is the confidence, n the number of binomial tests, f the number of
failures experienced, s the number of successes (s = n - f), and b(α,β) the beta density
function with parameters α,β, then the associated cumulative beta distribution

$$1 - \gamma = \int_0^{y} b(t;\, s, f+1)\,dt$$

provides the binomial LCB at the 100γ% confidence to be y; that is, the LCB at 90%
confidence is the point of 10% cumulative probability. The associated beta
distribution with parameters s, f+1 provides the binomial LCBs for any γ.
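As a rough illustration of this relationship, the sketch below computes a binomial LCB numerically. It takes the LCB at confidence γ to be the point where the cumulative beta distribution with parameters (s, f+1) equals 1 - γ, which is the reading that reproduces the 0.79 and 0.93 bounds quoted above for zero failures in ten trials. The function names and the bisection tolerance are illustrative assumptions, not part of the referenced procedure.

```python
from math import comb


def beta_cdf_int(y, s, f):
    """CDF at y of a beta distribution with integer parameters (s, f + 1).

    For integer parameters this equals P(K >= s) for K ~ Binomial(s + f, y).
    """
    n = s + f
    return sum(comb(n, k) * y**k * (1.0 - y) ** (n - k) for k in range(s, n + 1))


def binomial_lcb(n, failures, confidence, tol=1e-10):
    """Lower confidence bound on survivability, located by bisection."""
    s = n - failures
    if s == 0:
        return 0.0  # no successes observed: the bound is trivially zero
    target = 1.0 - confidence
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if beta_cdf_int(mid, s, failures) < target:
            lo = mid  # cumulative probability too small: bound lies higher
        else:
            hi = mid
    return 0.5 * (lo + hi)


for n, failures in [(10, 3), (10, 0), (100, 30), (100, 0)]:
    print(n, failures, round(binomial_lcb(n, failures, 0.90), 2))
```

Calling `binomial_lcb(10, 0, 0.50)` gives the 50% bound for the same zero-failure result; other confidence levels trace out the full cumulative survivability distribution.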


Table 2-1. 90% LCBs for different test results.

    Component A (10 trials)        Component B (100 trials)
    Failures      90% LCB          Failures      90% LCB
       3           0.45               30           0.63
       2           0.55               20           0.74
       1           0.66               10           0.85
       0           0.79                0           0.98

Table 2-2. 50% LCBs for different test results.

    Component A (10 trials)        Component B (100 trials)
    Failures      90% LCB          Failures      90% LCB
       3           0.45               30           0.63
       2           0.55               20           0.74
       1           0.66               10           0.85
       0           0.79                0           0.98

2.1.2  Relationship  Between  Confidence  Bounds  And  Numbers  Of  Tests. 

Say  we  have  tested  component  A  ten  times  in  a  specific  environment  and  observed 
that  it  failed  (did  not  survive)  three  times.  The  mean  survivability  point  estimate  is  0.7, 
and  the  sample  variance  (the  second  moment  about  the  sample  mean)  is  0.021.  Now,  let 
us further assume that we continued to test the same component in the same
environment  accruing  100  tests  with  30  failures  experienced.  Again,  the  mean 
survivability  point  estimate  is  0.7,  but  the  sample  variance  is  0.0021,  a  ten-fold 
reduction  over  the  results  from  only  ten  tests.  A  plot  of  these  survivability  distributions 
(the  binomial  LCBs)  in  Figure  2-1  reflects  the  reduced  variability  in  the  survivability 
estimates  from  increased  testing  resulting  in  higher  LCBs  at  higher  confidence.  At  90% 
confidence  (10%  cumulative  probability)  the  LCB  with  10  tests  is  0.45,  whereas  it  is 
0.63  with  100  tests  for  results  yielding  the  same  70%  proportion  of  successes. 
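The two variances quoted above follow if the sample variance is taken to be the usual binomial estimate p(1-p)/n with p = 0.7. That formula is an assumption here, although it reproduces both figures:

```latex
\[
\hat{v}_{10} = \frac{0.7 \times 0.3}{10} = 0.021,
\qquad
\hat{v}_{100} = \frac{0.7 \times 0.3}{100} = 0.0021 .
\]
```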
Figure 2-1. Survivability distributions
(effects of additional testing for one component).
Axes: survivability (x); cumulative probability (Prob X <= x).


Similar  results  occur  for  other  sets  of  test  results  as  shown  in  Figure  2-2. 
Increased  testing  reduces  the  variance  of  the  distributions  essentially  proportional  to  the 
increased  numbers  of  tests.  It  also  increases  the  LCBs  at  higher  confidence  levels  for 
data  sets  having  the  same  success  proportions. 

Figure  2-3  records  the  survivability  distributions  that  result  from  zero  failures  in 
n  trials  for  n  =  100,  500,  and  1000  trials. 
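For the zero-failure case the bound has a closed form, sketched below. It assumes the beta relationship of Section 2.1.1 with parameters (n, 1), whose cumulative distribution is y^n, so the LCB at a given confidence solves y^n = 1 - confidence. The function name is illustrative.

```python
def zero_failure_lcb(n, confidence):
    """LCB on survivability after n trials with no failures.

    With zero failures the cumulative distribution is y**n, so the bound
    solves y**n = 1 - confidence.
    """
    return (1.0 - confidence) ** (1.0 / n)


for n in (100, 500, 1000):
    print(n, round(zero_failure_lcb(n, 0.90), 4))
```

The bound approaches 1 as n grows, but only slowly: each additional factor of ten in trials removes roughly one more decimal place of doubt.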

2.2  DEVELOPING  SURVIVABILITY  DISTRIBUTIONS  BASED  ON 
KNOWN  PHYSICS. 

The  Defense  Nuclear  Agency,  its  contractors,  and  other  scientists  studying  the 
effects  of  nuclear  environments  on  structures,  electronics,  equipment,  and  personnel 
have  gathered  data,  conducted  tests,  explored  the  phenomena,  and  developed  a  solid 
understanding of many underlying nuclear effects and mechanisms (upset, damage,
noise  generation).  These  scientists  have  developed  algorithms  relating  damage  to  stress 
-  radiation,  overpressure,  temperature,  and  shock.  In  many  cases  these  algorithms  have 
been  checked  with  results  from  tests  and  incorporated  into  various  nuclear  effects 
damage  codes.  Hence,  given  the  signature  and  location  of  a  nuclear  detonation,  the 
expected  damage  upon  equipment,  structures,  and  personnel  can  be  simulated  with 
acceptable  accuracy. 

In  cases  where  there  are  no  directly  applicable  test  data,  the  survivability 
distributions  for  components  and  subsystems  can  be  developed  from  the  simulation  data 
from  validated  nuclear  effects  codes  as  an  alternate  data  source.  When  uncertainties  are 
addressed  in  these  simulations,  the  results  can  be  treated  as  those  from  actual  tests  as 
above.  Again  the  results  are  binomial  -  the  equipment  either  survives  or  fails. 
Consequently,  the  simulation  results  can  be  used  to  develop  survivability  distributions 
just  as  were  test  results  as  discussed  in  Section  2.1  above. 

2.2.1  Lognormal  Stress-To-Failure  Distributions. 

It  has  been  observed  that  on  occasion  the  probabilities  of  equipment  failure  are 
distributed  lognormally  with  stress,  e.g.,  electronic  part  failures  with  radiation.  Where 
experience  has  established  this  relationship,  use  of  the  lognormal  distribution  implies 
known  physics.  With  relatively  few  test  samples,  one  can  establish  the  parameters  for 
the  appropriate  approximating  failure  distribution. 
Figure 2-3. Survivability distributions for zero failures in n trials
(n=100, 500, 1000).
Axes: survivability (x); cumulative probability (Prob X <= x).
The lognormal distribution is common and natural for describing data that varies
by  factors.  It  relates  to  the  application  of  the  normal  distribution  for  data  that  varies  by 
additive  or  subtractive  increments.  For  the  lognormal  distribution  the  exponent  (or 
logarithm  of  the  raw  variable)  varies  normally. 

A common approach (U.S. Atomic Energy Commission, 1974 and Institute of
Electrical and Electronics Engineers, Inc., 1976) is to consider that a 90% confidence
interval from the 5% (X_L) and 95% (X_U) bounds of the distribution can be defined
from the median X_0. In this case X_L = X_0/f and X_U = X_0*f, so the extent of variation
between X_L and X_U is f^2, where f is the factor of variation that defines the 5% and 95%
bounds. If f = 10, for example, the range of variation in this interval is two orders of
magnitude (10^2). If f = 5, the range of variation is from 0.2 to 5 times the median (5^2), etc.

With μ and σ^2 representing the mean and variance of the normal distribution
which the logarithms of the original variable follow, the mode, median, mean, and
variance of the original lognormally distributed variable are defined in Figure 2-4.


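Log Normal Properties

Frequency (density) function: $f(x) = \frac{1}{\sigma x \sqrt{2\pi}} \exp\left[-\frac{(\ln x - \mu)^2}{2\sigma^2}\right]$

Mode (most probable value): $x_m = e^{\mu - \sigma^2}$

Median: $x_{0.5} = e^{\mu}$

Median (in terms of upper and lower bounds): $x_{0.5} = \sqrt{X_U X_L}$

Mean: $\bar{x} = e^{\mu + \sigma^2/2}$

Variance: $v = e^{2\mu + \sigma^2}\,(e^{\sigma^2} - 1)$


Figure 2-4. Lognormal density function.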
The cumulative lognormal failure distribution of Figure 2-5 depicts the
developed relationship between the stress level and the probability of failure (Pf). For
a specific stress there is an associated Pf point estimate. The probability of survival
point estimate Ps = 1 - Pf. To conduct our Monte Carlo sampling to determine
survivability  distributions  for  higher  levels  of  integration,  we  need  a  survivability 
distribution  for  this  component,  not  just  a  point  estimate. 
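A minimal sketch of how such a point estimate might be computed is given below. It assumes the factor-of-variation convention described above, with the 5% and 95% bounds at median/f and median*f, so that the standard deviation of the log-stress is ln(f) divided by the 95% standard normal quantile. The function names and the numerical inputs are hypothetical.

```python
from math import log
from statistics import NormalDist

# 95% quantile of the standard normal distribution (about 1.645).
Z95 = NormalDist().inv_cdf(0.95)


def lognormal_params(median, factor):
    """Return (mu, sigma) of ln(stress-to-failure) from the median and factor f."""
    mu = log(median)
    sigma = log(factor) / Z95
    return mu, sigma


def failure_probability(stress, median, factor):
    """Pf at a given stress under the assumed lognormal failure distribution."""
    mu, sigma = lognormal_params(median, factor)
    return NormalDist(mu, sigma).cdf(log(stress))


# Hypothetical component: median failure stress 1000 (arbitrary units), f = 5.
pf = failure_probability(300.0, 1000.0, 5.0)
ps = 1.0 - pf
print(round(pf, 3), round(ps, 3))
```

By construction the failure probability is 0.05 at median/f, 0.5 at the median, and 0.95 at median*f, which gives a quick check on any inputs.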


Figure 2-5. Lognormal failure distribution.
Axis: logarithm of stress level (x).

We  know  that  there  are  uncertainties  in  developing  the  lognormal  distribution 
above  due  to  sparse  data,  the  assumption  of  the  distributional  form,  variations  in  parts, 
the  effects  of  integration,  etc.  Consequently,  if  we  let  the  Ps  point  estimate  derived  as 
above  be  the  mean  (m)  and  specify  its  variance  (v)  for  this  stress  level  from  engineering 
judgment (or from repeated sampling at the specified stress), the beta parameters
specifying this component's survivability distribution for the stress level of interest can
be derived from the relationships

$m = \alpha/(\alpha+\beta)$,

$v = \alpha\beta/((\alpha+\beta)^2(\alpha+\beta+1))$.

Hence, given m and v, the beta parameters are:

$\beta = (m/v)(1-m)^2 - (1-m)$,

$\alpha = \beta m/(1-m)$.

For example, if the mean Ps for a specified stress were found to be 0.95 and we
estimate the variance to be 0.001, the corresponding beta parameters are

β = 2.325,

α = 44.175.

From  Section  2.1  above  this  is  roughly  equivalent  to  the  results  from  one  failure 
in  45  trials  at  this  stress  level. 
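The conversion from a mean and variance to beta parameters can be sketched as follows. The function name and the input checks are illustrative additions; the two formulas are those given above.

```python
def beta_from_mean_variance(m, v):
    """Beta parameters (alpha, beta) for a given mean m and variance v."""
    if not 0.0 < m < 1.0:
        raise ValueError("mean must lie strictly between 0 and 1")
    if not 0.0 < v < m * (1.0 - m):
        raise ValueError("variance must be positive and below m * (1 - m)")
    beta = (m / v) * (1.0 - m) ** 2 - (1.0 - m)
    alpha = beta * m / (1.0 - m)
    return alpha, beta


alpha, beta = beta_from_mean_variance(0.95, 0.001)
print(round(alpha, 3), round(beta, 3))  # 44.175 2.325
```

The variance check matters in practice: a variance at or above m(1 - m) has no corresponding beta distribution, and the formulas would return non-positive parameters.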

Instead of specifying the variance directly, an alternate approach to developing
the requisite beta parameters is to specify the number of successes (s) in n trials that
produce the Ps estimate. Then α = s, β = n-s+1. Using this approach, if Ps were found
to be 0.95 and we consider that there is a large uncertainty, we may postulate that Ps =
19/20, 19 successes in 20 trials. Hence, α = 19, β = 2. In this case the variance is
0.0039; the mode of the distribution, 18/19, is approximately 0.95, while its mean,
19/21, is approximately 0.90.

Postulating a smaller variance, we increase the number of tests and let the ratio,
which equals Ps, remain constant. If we let n = 40, then Ps = 38/40. Consequently,
α = 38, β = 3 and v = 0.0016.

So,  by  supplying  some  measure  of  the  uncertainty  in  the  survivability  point 
estimate  developed  from  the  lognormal  failure  distributions,  the  requisite  beta 
survivability  distributions  can  be  explicitly  defined. 
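The trial-count approach can be sketched in the same way. The function names are illustrative; the parameter mapping and the variance formula are the ones stated in this section.

```python
def beta_from_trials(successes, n):
    """Beta parameters implied by postulating `successes` in n trials."""
    return successes, n - successes + 1


def beta_variance(alpha, beta):
    """Variance of a beta distribution with parameters alpha and beta."""
    total = alpha + beta
    return alpha * beta / (total**2 * (total + 1))


for s, n in [(19, 20), (38, 40)]:
    a, b = beta_from_trials(s, n)
    print(a, b, round(beta_variance(a, b), 4))
```

Doubling the postulated number of trials at a fixed success ratio roughly halves the variance, which is how the analyst's degree of uncertainty enters the distribution.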


2.2.2 Nuclear Radiation Stress-To-Failure Survival Margins and
Failure Probabilities.

Methods  have  been  developed  for  extrapolating  estimates  of  the  probability  of 
survival  of  electronic  systems  operating  in  nuclear  radiation  environments  from  small 
sample  sizes  of  components  exposed  to  increasing  radiation  stress  until  failure.  A 
report  (Jordon,  1989)  of  work  funded  by  the  Navy  Standard  Missile  Program  Office, 
PMS-422,  the  Theater  Nuclear  Program  Office,  PMS-423,  the  Office  of  Naval  Research 
(ONR),  and  the  American  Society  for  Engineering  Education  (ASEE)  developed  a 
lognormal  radiation  stress-failure  distribution  based  on  the  results  of  numerous  tests  on 
semi-conductors. 

The  basic  concept  is  that  there  are  relationships  between  the  threshold  of  failure 
(TF),  the  lowest  level  of  radiation  stimulus  that  produces  a  response  that  can  cause  part 
or  system  failure,  and  the  specification  (SPEC).  The  survival  margin  (SM)  is  the 
dimensionless  arithmetic  ratio  of  the  TF  (which  corresponds  approximately  with  the 
90%  Ps  at  the  90%  confidence  level)  and  the  SPEC  level;  SM=TF/SPEC. 

Figure  2-6  portrays  a  90%  confidence  lognormal  density  function.  The  SPEC 
level,  the  TF,  and  the  geometric  and  arithmetic  means  are  noted.  In  this  figure,  the 
area under the curve to the left of TF is 0.1; that is, 90% of the time a higher radiation
level would be required to cause failure, or the probability is 0.9 that the component
would not fail up to the TF radiation level. The variance estimates for the lognormal
distributions  are  either  inferred  from  engineering  judgment  or  obtained  from 
experimentally  measured  data  assuming  a  lognormal  failure  distribution. 

In the samples taken the parts are exposed to increasing levels of radiation until
failure occurs. If r_i are the radiation levels at which part failures occur for each of n
samples, then the arithmetic mean (AM) of radiation failure is

$\mathrm{AM} = \frac{1}{n}\sum_{i=1}^{n} r_i$

The logarithmic mean (LM) is defined

$\mathrm{LM} = \frac{1}{n}\sum_{i=1}^{n} \ln(r_i)$

and the geometric mean (GM) as GM = e^{LM}. The authors state that the GM is a better
average measure of the mean failure level for a lognormal failure distribution than is
the AM, but using either provides an estimate for the Ps at the 50% confidence level.
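The three averages can be computed directly from a list of observed failure levels, as in the sketch below. The sample values are hypothetical and the function name is illustrative.

```python
from math import exp, log


def failure_level_means(levels):
    """Return (AM, LM, GM) for a list of radiation levels at failure."""
    n = len(levels)
    am = sum(levels) / n
    lm = sum(log(r) for r in levels) / n
    gm = exp(lm)
    return am, lm, gm


# Hypothetical failure levels for five parts, in arbitrary radiation units.
levels = [1.2e4, 2.5e4, 3.1e4, 4.0e4, 9.5e4]
am, lm, gm = failure_level_means(levels)
print(f"AM = {am:.0f}, LM = {lm:.3f}, GM = {gm:.0f}")
```

The geometric mean never exceeds the arithmetic mean, and the gap widens as the failure levels spread over more orders of magnitude.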
Figure 2-6. 90% confidence lognormal density function (vertical axis: number of failures).
Legend: SPEC = specification level; TF = threshold of failure; GM = geometric mean;
AM = arithmetic mean; failure and survival probabilities are marked at the threshold
and at the SPEC level.
2.3 DEVELOPING SURVIVABILITY DISTRIBUTIONS FROM
ENGINEERING  JUDGEMENT. 

Procedures  for  developing  survivability  distributions  based  on  test  data  and 
knowledge  of  physics  were  discussed  in  Sections  2.1  and  2.2  above.  When  neither  of 
these  data  sources  exist,  survivability  distributions  can  still  be  developed,  albeit  with  less 
credence  and  employing  wholly  subjective  judgments. 

In  these  cases,  we  invite  knowledgeable  experts  to  provide  their  judgment  of  the 
survivability  of  the  components  or  subsystems.  (This  is  the  Delphi  method:  see  Institute 
of  Electrical  and  Electronics  Engineers,  Inc.,  1977,  Appendix  B,  p.  25.)  This  can  be 
done  in  various  ways.  One  approach  is  to  ask  the  experts  to  provide  estimates  of  the 
stress levels that produce 5% and 95% Pfs. Given this information we proceed as in
Section  2.2.1.  An  alternate  approach  is  to  ask  them  to  specify  the  beta  parameters  for 
the  survivability  distributions  directly.  If  several  experts  provide  data,  it  will  be 
important  to  perform  excursions  employing  each  data  set  to  ascertain  the  system 
sensitivities  to  these  inputs.  One  may  also  average  the  means  and  variances  of  the 
survivability  distributions  provided  by  the  experts  to  develop  a  composite  representative 
distribution. 
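One way the averaging step might be carried out is sketched below. Each expert's judgment is assumed to arrive as a pair of beta parameters; the means and variances are averaged, and the result is converted back to beta parameters with the relationships of Section 2.2.1. That final conversion is an assumption of this sketch, and the expert inputs are hypothetical.

```python
def beta_mean_variance(alpha, beta):
    """Mean and variance of a beta distribution."""
    total = alpha + beta
    return alpha / total, alpha * beta / (total**2 * (total + 1))


def beta_from_mean_variance(m, v):
    """Beta parameters (alpha, beta) matching a mean m and variance v."""
    beta = (m / v) * (1.0 - m) ** 2 - (1.0 - m)
    return beta * m / (1.0 - m), beta


def composite_beta(expert_params):
    """Average the experts' means and variances, then refit a beta."""
    moments = [beta_mean_variance(a, b) for a, b in expert_params]
    m = sum(mean for mean, _ in moments) / len(moments)
    v = sum(var for _, var in moments) / len(moments)
    return beta_from_mean_variance(m, v)


# Hypothetical judgments from three experts, each given as (alpha, beta).
experts = [(19, 2), (38, 3), (9, 2)]
alpha, beta = composite_beta(experts)
print(round(alpha, 2), round(beta, 2))
```

Averaging the variances alone does not capture disagreement between the experts' means, which is one reason to run the separate excursions for each data set as well.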

Because  we  have  less  credence  in  this  approach,  it  will  only  be  employed  when 
there  is  no  alternative.  When  employing  it,  the  uncertainties  understandably  will  be 
large.  However,  the  results  may  still  be  acceptable  for  systems  with  numerous 
redundant  sets  of  equipment.  The  important  point  is  that  even  in  the  absence  of 
statistically  significant  test  data  and  lacking  understanding  of  the  physics  of  failure, 
survivability  distributions  for  components  and  subsystems  can  still  be  developed.  The 
fact  that  the  survivability  distributions  will  exhibit  large  uncertainties  will  be 
highlighted  and  the  system  effects  quantified.  This  fact,  and  other  procedures 
mentioned  below,  will  alert  decision  makers  where  additional  data  and  research  are 
needed  and  will  help  them  prioritize  the  allocation  of  resources  to  obtain  the  necessary 
information  to  increase  both  confidence  and  survivability  measures. 
SECTION 3


SYSTEM  SURVIVABILITY  DISTRIBUTIONS  FOR 
SIMPLE  SERIES  AND  PARALLEL  SYSTEMS 


We  now  develop  the  system  survivability  distributions  for  a  simple  series  and  a 
simple parallel system comprised of two components A and B (see Figures 3-1 and 3-2).
We chose the survivability distribution for component A as being that derived from
test  results  of  3  failures  in  10  trials;  that  for  component  B  as  from  the  test  results  of  30 
failures  in  100  trials. 


Figure  3-1.  Simple  series  system. 


Using  the  methodology  developed  in  Williams,  1992,  we  randomly  sampled  each 
of  the  component  A  and  component  B  distributions  (assuming  statistical  independence) 
1000  times  and  developed  the  survivability  distributions  for  the  composite  series  and 
parallel  systems.  The  results  are  shown  in  Figure  3-3. 

Note  that  the  results  for  the  series  system  are  always  lower  than  those  for  either 
of the two components alone because both components must survive for the system to
survive. Conversely, for the parallel system where the survival of either component
results in system survival, the system survivability is always better than that for either
component.  Variability  is  also  reduced  for  the  parallel  system  (more  vertical 
distribution). 

For these simple system configurations and employing the distributions postulated
for components A and B, we are 90% confident that system survivability exceeds 0.30
for the series system, but that it exceeds 0.82 for the parallel system for this
environment. The redundant configuration buys a lot of assurance in survivability for
this case. It does so in general as well.
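The sampling procedure for the two-component systems can be sketched as follows. This is an illustration under stated assumptions, not the referenced methodology's software: component A is taken as beta(7, 4) and component B as beta(70, 31), using the s, f+1 parameters of Section 2.1.1, and the components are sampled independently. The seed and function names are arbitrary.

```python
import random


def sample_systems(params_a, params_b, n_samples=1000, seed=1):
    """Sample series and parallel system survivability from two beta components."""
    rng = random.Random(seed)
    series, parallel = [], []
    for _ in range(n_samples):
        s_a = rng.betavariate(*params_a)
        s_b = rng.betavariate(*params_b)
        series.append(s_a * s_b)                          # both must survive
        parallel.append(1.0 - (1.0 - s_a) * (1.0 - s_b))  # either may survive
    return sorted(series), sorted(parallel)


def lower_bound(sorted_samples, confidence):
    """Empirical LCB: the value met or exceeded by that fraction of samples."""
    index = round((1.0 - confidence) * len(sorted_samples))
    return sorted_samples[min(index, len(sorted_samples) - 1)]


# Component A: 3 failures in 10 trials -> beta(7, 4).
# Component B: 30 failures in 100 trials -> beta(70, 31).
series, parallel = sample_systems((7, 4), (70, 31))
print(round(lower_bound(series, 0.90), 2), round(lower_bound(parallel, 0.90), 2))
```

The two printed bounds should fall roughly where the 0.30 and 0.82 quoted above do, up to sampling noise from the finite number of draws.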


Figure 3-3. Survivability distributions for two components in series and parallel
configurations.
Axes: survivability (x), 0 to 1; cumulative probability (Prob X <= x).
SECTION 4


DEVELOPING  THE  SURVIVABILITY  OF  A  REALISTIC  SYSTEM 


Survivability protocols are developed to provide assurance that large systems
meet survivability goals and requirements. The measure of the relative merit of
alternate protocols is the developed survivability at specific confidence or, alternatively,
the associated confidence at fixed high survivability. Consequently, to be able to
discriminate between the merit of alternate protocols requires that system survivability
distributions  be  developed  for  each.  As  the  prior  discussion  has  highlighted,  the  amount 
of  test  data  available,  and  the  system  configuration  both  play  important  roles 
influencing  the  resultant  system  survivability  distributions. 

It  is  also  important  to  recall  that  the  entire  discussion  to  this  point  has  been 
conducted  assuming  that  the  data  on  the  several  components  were  for  the  same 
environment  and  that  the  component  responses  were  assumed  to  be  statistically 
independent.  To  extrapolate  such  results  to  other  environments,  or  to  begin  to 
address  statistical  dependencies  among  component  responses,  or  both  simultaneously, 
requires  either  additional  testing  at  higher  levels  of  integration  in  the  right 
environment,  or  the  use  of  subjective  judgment  to  develop  the  survivability 
distributions. 

Assuming  that  appropriate  survivability  distributions  can  be  developed  for  the 
constituent  subsystems,  we  develop  the  model  and  the  system  survivability  distribution 
for  the  conceptual  HSA  communication  network  of  Figure  4-1.  This  network  is  further 
described  in  Table  4-1. 

For  the  purpose  of  our  example,  we  postulate  the  following: 

1. There are 3 Space-Based Launch Sensors (SBLSs), each of which
communicates with each of the Space-Based Communication Nodes
(SBCNs).

2. There are a total of three 4 SBCNs, each of which communicates with each
Ground-Based Communication Node (GBCN).

3. There are 4 GBCNs, each of which communicates with the North
American Air Defense Command (NORAD). (GBCN #4)

4. There are 3 Space-Based Mid-Course Trackers (SBMCTs), each of
which communicates with each SBCN.

5. All other equipment and communication links are single entities.
Figure 4-1. HSA communications network.


Table 4-1. HSA communication network.
Figure 4-2 is the GO Logic Model of the HSA communication network. Each of
the elements of the model represents equipment for which associated survivability
distributions, given exposure to a postulated nuclear detonation at a specified location
with a specified signature, have been developed. Each of the elements of the HSA
communication  network  is  represented  with  a  pair  of  "type"-"kind"  numbers  separated 
by  a  hyphen.  The  "type"  number  captures  the  logical  essence  of  the  component  and 
refers  to  one  of  17  defined  logical  operators  in  the  GO  methodology  (Gately,  et  al., 
1983).  A  type  1  operator  represents  the  logical  operation  of  an  equipment  which  either 
performs,  or  fails  to  perform,  its  function  given  a  proper  input  or  stimulus.  The 
associated  "kind"  number  is  simply  the  sequential  number  in  an  array  that  references 
the  probabilities  with  which  that  component  takes  its  several  operational  states  -  e.g., 
good,  bad,  premature.  Arrows  depict  the  recorded  input  and  output  "signals"  (a 
carryover  from  electrical  schematics)  that  are  really  discrete  random  variables  that  take 
pre-defined  values  representing  success  or  failure.  For  this  example,  the  random 
variables take only two values - 0 for success, or 1 for failure. In Figure 4-2, for
example, the success event that Space-Based Launch Sensor Number 1 (labeled as type-kind
1-1) operates properly and sends a signal (11) to Space-Based Comm Node
Number 1 (labeled 1-7) via Comm Link A (labeled 1-4) when the model is exercised, is
expressed as the event that signal 11 takes value 0, or simply 11_0.

Elements of the model that simply represent logical operations ("and" gates,
"or" gates, or "m out of n" gates) have no associated probabilities and,
consequently, no associated "kind" numbers. For example, the type 2 operators
represent "or" gates and the type 10 operators represent "and" gates in
Figure 4-2. "Signal" 500 near the lower left corner of Figure 4-2 represents
the logical output for the HSA communication network. It represents the system
survivability whose distribution will be developed.
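The operator logic described above can be illustrated with a minimal Python
sketch. It is not the KSC GO software; the function names, the 0.99
probabilities, and the three-element chain are assumptions made only for
illustration. Because 0 denotes success and 1 denotes failure, an "or" gate
reduces to the minimum of its inputs and an "and" gate to the maximum.

```python
import random

SUCCESS, FAILURE = 0, 1  # signal values used in this example


def type1(signal_in, p_good, rng):
    """Type 1 operator: a good input is passed on only if the component works.

    p_good is a hypothetical probability that the component is in its
    "good" state for this trial.
    """
    component_ok = rng.random() < p_good
    return SUCCESS if (signal_in == SUCCESS and component_ok) else FAILURE


def or_gate(*signals):
    """Type 2 operator: output succeeds if at least one input succeeds."""
    return min(signals)


def and_gate(*signals):
    """Type 10 operator: output succeeds only if every input succeeds."""
    return max(signals)


rng = random.Random(0)
# Hypothetical fragment: a sensor feeding a comm link feeding a comm node.
sensor = type1(SUCCESS, 0.99, rng)
link = type1(sensor, 0.99, rng)
node = type1(link, 0.99, rng)
print("node output signal:", node)
```

Chaining type 1 operators in this way models elements in series, while the
gate functions combine redundant paths.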

For simplicity, the postulated survivability distributions for the constituent
subsystems and communication links are all identical beta distributions with
the same parameters, and we explored the system survivability distributions
for two cases. In the first case all components have beta parameters 100,1,
reflecting the fact that there were no failures in 100 trials on any component
(subsystem). In the second case the parameters are 200,1 (200 tests with no
failures on any component). The two output distributions are shown in
Figure 4-3. (These results were generated on an MS-DOS 386 PC using the KSC GO
software.)
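For a single component, a beta distribution with parameters a,1 has the simple
cumulative form x^a, so its confidence bounds can be written in closed form.
The sketch below assumes that the first beta parameter is the success-side
shape parameter and that "confidence" means the probability that survivability
exceeds a given value; both function names are hypothetical.

```python
def confidence_exceeds(s, a):
    """Probability that a Beta(a, 1) survivability exceeds s (CDF is s**a)."""
    return 1.0 - s ** a


def lower_bound(confidence, a):
    """Survivability value exceeded with the given confidence under Beta(a, 1)."""
    return (1.0 - confidence) ** (1.0 / a)


for a in (100, 200):
    print(a, round(lower_bound(0.90, a), 3))
```

Under these assumptions the 90% lower bound on a single component is about
0.977 for parameters 100,1 and about 0.989 for parameters 200,1.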

The corresponding confidence functions are graphed in Figure 4-4. In case 1
(marked on the graph by a square at every 5th data point), where the results
reflect zero failures in 100 tests on all components, we are 90% confident
that system survivability exceeds 0.82. We have zero confidence that it will
equal or exceed 0.95.

For the 200 test case (circles) we are 90% confident that system survivability
exceeds 0.91. We are 15% confident that it equals or exceeds 0.95.
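A confidence function of this kind can be estimated from Monte Carlo draws.
The following Python sketch is a simplification, not the Figure 4-2 model: it
assumes the system behaves like 14 independent components in series (the
number of first-order fault sets listed in Table 4-2) and ignores the
redundant elements. The function names, seed, and sample count are
hypothetical.

```python
import random


def sample_series_survivability(n_components, a, b, rng):
    """One draw of system survivability for components in series."""
    s = 1.0
    for _ in range(n_components):
        s *= rng.betavariate(a, b)  # draw each component's survivability
    return s


def confidence_function(samples, threshold):
    """Fraction of draws whose survivability equals or exceeds the threshold."""
    return sum(1 for s in samples if s >= threshold) / len(samples)


def lower_bound(samples, confidence):
    """Survivability value exceeded by the given fraction of draws."""
    ordered = sorted(samples)
    return ordered[int((1.0 - confidence) * len(ordered))]


rng = random.Random(1)
case1 = [sample_series_survivability(14, 100, 1, rng) for _ in range(20000)]
case2 = [sample_series_survivability(14, 200, 1, rng) for _ in range(20000)]

for name, samples in (("100 tests", case1), ("200 tests", case2)):
    print(name, round(lower_bound(samples, 0.90), 3),
          round(confidence_function(samples, 0.95), 3))
```

Under these assumptions the 90% lower bounds should land near 0.83 and 0.91,
which is consistent with the values read from Figure 4-4.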


Because it would be astronomically expensive to test all components
(subsystems) to the levels reflected in Case 2 (200 tests for each of the 40
communication nodes and links), system decision makers need to know how to
optimize testing. Consequently, we employed the KSC GO software on the system
configuration of Figure 4-2 and identified the low-order fault sets (see
Table 4-2). A fault set is a set of components or elements whose simultaneous
failures fail the system. In the HSA system there are 14 first-order fault
sets, two third-order fault sets, and 61 fourth-order fault sets. Components
whose single failures cause system failure (first-order fault sets) are most
critical to system success. Components in all other fault sets are redundant
to a degree that increases with fault set order.
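The idea of a fault set can be made concrete with a brute-force search over a
toy structure. The sketch below is an assumption for illustration only; the
report does not describe how the KSC GO software finds fault sets, and the
four-component system (one command element "C" plus three redundant sensors)
is hypothetical.

```python
from itertools import combinations


def system_works(failed):
    """Toy structure: C must work, and at least one of S1..S3 must work."""
    command_ok = "C" not in failed
    sensors_ok = any(s not in failed for s in ("S1", "S2", "S3"))
    return command_ok and sensors_ok


def minimal_fault_sets(components, works, max_order):
    """List minimal sets of failed components that fail the system."""
    found = []
    for order in range(1, max_order + 1):
        for combo in combinations(components, order):
            candidate = frozenset(combo)
            if any(fs <= candidate for fs in found):
                continue  # contains a smaller fault set, so it is not minimal
            if not works(candidate):
                found.append(candidate)
    return found


fault_sets = minimal_fault_sets(["C", "S1", "S2", "S3"], system_works, 4)
for fs in fault_sets:
    print(len(fs), sorted(fs))
```

For this toy structure the search returns one first-order fault set (C alone)
and one third-order fault set (all three sensors), mirroring the pattern of
single-point elements and redundant groups in Table 4-2.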




Figure  4-2.  KSC  GO  survivability  model  of  HSA  communications  network. 
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Figure  4-3.  HSA  communication  network  survivability  distributions. 
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Figure  4-4.  HSA  communication  network  survivability  confidence  functions. 
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With  this  fault  set  information,  decision  makers  could  prioritize  component  or 
subsystem testing toward the elements in low-order fault sets. Alternatively,
they could change the system configuration to provide additional redundancy.
For this HSA communication network, suppose that only the 14 first-order
components receive increased testing, say to 200 tests with no failures
(resulting in beta survivability distributions with parameters 200,1), while
all other components receive no further testing and retain the knowledge from
the former 100 tests with no failures (beta parameters 100,1). The resulting
system survivability distribution (triangles) is shown in Figure 4-5 along
with the other two distributions. The corresponding confidence functions are
portrayed in Figure 4-6.

The curve with every 5th data point represented by triangles reflects
optimized testing of only system elements in first-order fault sets. It
overlies the case where all components had been tested 200 times (circles),
differing only in the random draws.

To  have  tested  all  the  other  components  so  extensively  would  have  been  a  significant 
waste  of  resources.  The  knowledge  of  their  performance  available  from  the  prior  100 
tests  in  this  environment  was  sufficient.  Indeed,  it  was  probably  excessive,  but  we  did 
not  perform  an  excursion  to  see  how  few  tests  would  have  been  required  on  all  but 
first-order  fault  set  components  to  maintain  the  system  survivability  distribution  at  the 
level  registered  in  the  200-sample  curves  of  Figures  4-3  and  4-5. 
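The effect of concentrating tests on first-order elements can be sketched with
a small Monte Carlo comparison. This is a toy stand-in for the Figure 4-2
model, not a reproduction of it: it assumes 14 series components plus two
hypothetical triple-redundant groups, independent components, and beta
parameters a,1 as in the text. All names, the seed, and the trial count are
assumptions.

```python
import random


def system_survivability(first_order, redundant_groups):
    """Series product of first-order elements times each group's survivability."""
    s = 1.0
    for p in first_order:
        s *= p
    for group in redundant_groups:
        all_fail = 1.0
        for p in group:
            all_fail *= (1.0 - p)  # a group fails only if every member fails
        s *= (1.0 - all_fail)
    return s


def simulate(a_first, a_other, rng, trials=10000):
    """Sorted system survivability draws for the given beta parameters."""
    results = []
    for _ in range(trials):
        first = [rng.betavariate(a_first, 1) for _ in range(14)]
        groups = [[rng.betavariate(a_other, 1) for _ in range(3)]
                  for _ in range(2)]
        results.append(system_survivability(first, groups))
    return sorted(results)


def lower_bound(sorted_samples, confidence):
    """Survivability value exceeded by the given fraction of draws."""
    return sorted_samples[int((1.0 - confidence) * len(sorted_samples))]


rng = random.Random(2)
all_100 = simulate(100, 100, rng)    # every element tested 100 times
all_200 = simulate(200, 200, rng)    # every element tested 200 times
optimized = simulate(200, 100, rng)  # only first-order elements retested

for name, runs in (("all 100", all_100), ("all 200", all_200),
                   ("optimized", optimized)):
    print(name, round(lower_bound(runs, 0.90), 3))
```

In this sketch the optimized case and the all-200 case should give nearly the
same 90% lower bound, because a redundant group fails only when all of its
members fail and so contributes very little to system failure.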
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Figure  4-5.  Survivability  distributions  from  optimized  testing. 
Figure 4-6. Survivability confidence functions from optimized testing.
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Table  4-2.  Low  order  fault  sets  for  HSA  communication  network. 
FIRST ORDER FAULT SETS

No.  Component
 1   Airborne Command Post (ABCP)
 2   Ground-Based Communication Node #4 = NORAD
 3   Comm Link H (Soft Ground-Based Early Tracker <-> NORAD)
 4   Soft Ground-Based Early Tracker (Soft GBET)
 5   Comm Link M (Ground-Based Final Tracker (GBFT) <-> NORAD)
 6   Hard GBET
 7   Comm Link N (On-Site landline to NORAD)
 8   GBFT
 9   Comm Link Q (On-Site landline to GBFT)
10   On-Site Command Center (CC)
11   Comm Link R1 (On-Site CC landline to missile in silo)
12   Ground-Based Exo-Interceptor (GBEI) Farm
13   Comm Link R2 (On-Site CC RF link to missile in flight)
14   Missile

SECOND ORDER FAULT SETS

None

THIRD ORDER FAULT SETS

No.  Component  Component  Component
 1   SBLS#1     SBLS#2     SBLS#3
 2   SBMCT#1    SBMCT#2    SBMCT#3

FOURTH ORDER FAULT SETS

No.  Component  Component  Component  Component
 1   SBCN#1     SBCN#2     SBCN#3     SBCN#4
 2   SBCN#1     SBCN#2     SBCN#3     P4
 3   SBCN#1     SBCN#2     SBCN#3     E4
 4   SBCN#1     SBCN#2     SBCN#3     I4
 5   SBCN#1     SBCN#2     SBCN#3     K4
 6   SBCN#1     SBCN#2     P3         SBCN#4
 7   SBCN#1     SBCN#2     P3         P4
 8   SBCN#1     SBCN#2     SBCN#4     E3
 9   SBCN#1     SBCN#2     SBCN#4     I3
10   SBCN#1     SBCN#2     SBCN#4     K3
11   SBCN#1     SBCN#2     E3         E4
12   SBCN#1     SBCN#2     I3         I4
13   SBCN#1     SBCN#2     K3         K4
14   SBCN#1     P2         SBCN#3     SBCN#4
15   SBCN#1     P2         SBCN#3     SBCN#4
16   SBCN#1     P2         P3         SBCN#4
17   SBCN#1     P2         P3         P4
18   SBCN#1     SBCN#3     SBCN#4     E2
19   SBCN#1     SBCN#3     SBCN#4     I2
20   SBCN#1     SBCN#3     SBCN#4     K2
21   SBCN#1     SBCN#3     E2         E4
22   SBCN#1     SBCN#3     I2         I4
23   SBCN#1     SBCN#3     K2         K4
24   SBCN#1     SBCN#4     E2         E3
25   SBCN#1     SBCN#4     I2         I3
26   SBCN#1     SBCN#4     K2         K3
27   SBCN#1     E2         E3         E4
28   SBCN#1     I2         I3         I4
29   SBCN#1     K2         K3         K4
30   P1         SBCN#2     SBCN#3     SBCN#4
31   P1         SBCN#2     SBCN#3     P4
32   P1         SBCN#2     P3         SBCN#4
33   P1         SBCN#2     P3         P4
34   P1         P2         SBCN#3     SBCN#4
35   P1         P2         SBCN#3     P4
36   P1         P2         P3         SBCN#4
37   P1         P2         P3         P4
38   SBCN#2     SBCN#3     SBCN#4     E1
39   SBCN#2     SBCN#3     SBCN#4     I1
40   SBCN#2     SBCN#3     SBCN#4     K1
41   SBCN#2     SBCN#3     E1         E4
42   SBCN#2     SBCN#3     I1         I4
43   SBCN#2     SBCN#3     K1         K4
44   SBCN#2     SBCN#4     E1         E3
45   SBCN#2     SBCN#4     I1         I3
46   SBCN#2     SBCN#4     K1         K3
47   SBCN#2     E1         E3         E4
48   SBCN#2     I1         I3         I4
49   SBCN#2     K1         K3         K4
50   SBCN#3     SBCN#4     E1         E2
51   SBCN#3     SBCN#4     I1         I2
52   SBCN#3     SBCN#4     K1         K2
53   SBCN#3     E1         E2         E4
54   SBCN#3     I1         I2         I4
55   SBCN#3     K1         K3         K4
56   SBCN#4     E1         E2         E3
57   SBCN#4     I1         I2         I3
58   SBCN#4     K1         K2         K3
59   E1         E2         E3         E4
60   I1         I2         I3         I4
61   K1         K2         K3         K4

NOTE: The names of the communication links from the SBCNs to the ABCP (Ei),
to the Soft GBET (Ii), and to the Hard GBET (Ki), and from the GBCNs to
NORAD (Pi), are subscripted with the communication node number.
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SECTION  5 


CONCLUSION 


This paper has discussed methods for developing, from various data sources,
the component and subsystem survivability distributions that are needed to
develop the survivability distributions for higher level systems which cannot
be tested. Methods for
developing  the  fundamental  and  lower-level  distributions  based  upon  test  data, 
knowledge  of  physics,  and  engineering  judgment  have  been  presented.  The  relationship 
between  discrete  binomial  test  results  and  continuous  beta  distributions  was  shown.  The 
effects  of  increased  testing  were  explored  and  documented. 

The  procedure  for  developing  system-level  survivability  distributions  from 
lower-level  distributions  was  discussed  and  the  method  applied  to  two  simple 
configurations,  then  to  a  realistic  HSA  communication  network.  The  final  result,  a 
system-level  survivability  distribution  which  quantifies  system  survivability,  is  also  used 
to  place  confidence  bounds  on  system  survivability.  Developing  system-level 
survivability  distributions  for  varying  survivability  protocols  permits  direct  comparison 
of  the  results.  The  system  survivability  can  be  compared  at  a  fixed  confidence  level,  or 
the  confidence  levels  at  a  fixed  system  survivability  can  be  compared  to  measure  the 
merit  of  alternate  protocols. 
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